Stat 8112 Lecture Notes
Unbiased Estimating Equations
Charles J. Geyer
April 29

1 Introduction

In this handout we generalize the notion of maximum likelihood estimation to solution of unbiased estimating equations. We are much less formal in this handout, merely giving a broad overview.

Unlike in Geyer (preprint) there is no no-$n$ version of these asymptotics (and as far as I can see there cannot be). Thus these asymptotics are based on the law of large numbers (LLN) and the central limit theorem (CLT), and $n$ is sample size. The mathematics we explicitly show in this handout will be for independent and identically distributed (IID) data. If one has non-IID data, then one must use an LLN or a CLT for such data.

Suppose $X_1, X_2, \ldots$ are IID and $g(x, \theta)$ is some continuously differentiable function of data and the parameter that satisfies

$$
E_\theta\{g(X_i, \theta)\} = 0, \qquad \text{for all } \theta. \tag{1}
$$

Write

$$
h_n(\theta) = \frac{1}{n} \sum_{i=1}^n g(X_i, \theta).
$$

We seek estimators satisfying

$$
h_n(\hat\theta_n) = 0. \tag{2}
$$

If $g$ is a vector-to-vector function, then so is $h_n$, and thus we say (2) are estimating equations (plural), thinking of each component of (2) as one scalar equation. We say the estimating equations are unbiased if

$$
E_\theta\{h_n(\theta)\} = 0, \qquad \text{for all } \theta, \tag{3}
$$

which follows from (1). The terminology is a bit different from the usual applications of unbiasedness. Clearly (3) says $h_n(\theta)$ is an unbiased estimator of zero if $\theta$ is the true unknown parameter value, but we usually don't think of random variables containing unknown parameters as estimators. Nevertheless, this is the accepted terminology for saying that (3) holds.
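To make (1) and (2) concrete, here is a minimal sketch (my own toy example, not from the notes): take $g(x, \theta) = 1/\theta - x$, the score for the exponential distribution with rate $\theta$, so (1) holds, and solve the estimating equation (2) by Newton's method. The closed-form root is $\hat\theta_n = 1/\bar{x}_n$, which the iteration should recover.

```python
# Toy estimating equation (an illustration, not from the notes):
# g(x, theta) = 1/theta - x is the score for the Exponential(rate = theta)
# model, so E_theta{g(X_i, theta)} = 0 and the root of h_n is the MLE 1/xbar.
import random

random.seed(42)
theta_true = 2.0
n = 10_000
x = [random.expovariate(theta_true) for _ in range(n)]
xbar = sum(x) / n

def h_n(theta):
    # h_n(theta) = (1/n) sum_i g(x_i, theta) = 1/theta - xbar
    return 1.0 / theta - xbar

def h_n_grad(theta):
    # derivative of h_n with respect to theta
    return -1.0 / theta ** 2

# Newton's method on the estimating equation h_n(theta) = 0
theta = 1.0
for _ in range(50):
    theta -= h_n(theta) / h_n_grad(theta)

print(theta)  # should be close to theta_true and agree with 1/xbar
```

Nothing here is specific to the exponential model: swap in any unbiased estimating function and its gradient for the hypothetical names `h_n` and `h_n_grad`, and the same Newton iteration applies (componentwise, with a Jacobian solve, in the vector case).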
One application of unbiased estimating equations takes $h_n(\theta) = \nabla l_n(\theta)$, where $l_n$ is the log likelihood. But this isn't a generalization of maximum likelihood; it is maximum likelihood. More applications will arrive in due course.

In this handout we will assume we can expand the estimating equations in a Taylor series with negligible error

$$
0 = h_n(\hat\theta_n) = h_n(\theta_0) + \bigl[\nabla h_n(\theta_0)\bigr](\hat\theta_n - \theta_0) + o_p(n^{-1/2}), \tag{4}
$$

where $\theta_0$ is the true unknown parameter value. In this handout we are vague about how one might establish (4), which is not at all obvious. In Geyer (preprint, Appendix C) a lot of work goes into establishing it from more easily verifiable assumptions.

From the CLT we have

$$
n^{1/2} h_n(\theta_0) \xrightarrow{w} \text{Normal}(0, V), \tag{5}
$$

where

$$
V = \operatorname{var}_{\theta_0}\{g(X_i, \theta_0)\}.
$$

From the LLN we have

$$
\nabla h_n(\theta_0) \xrightarrow{w} U, \tag{6}
$$

where

$$
U = E_{\theta_0}\{\nabla g(X_i, \theta_0)\}.
$$

In the theory of maximum likelihood we have $V = -U$ by the second Bartlett identity (Ferguson, 1996, p. 120). Here $U$ need not even be a symmetric matrix, and even if it is, there is nothing to make $V = -U$ hold; in general, $V \ne -U$.

We do assume that $U$ and $V$ have the same dimensions, so $U$ is square, $h_n$ is a vector-to-vector function between vector spaces of the same dimension, and there are the same number of estimating equations as parameters in (2). This does not assure that solutions of the estimating equations exist, nor does it assure that solutions are unique if they exist, but uniqueness is impossible without as many estimating equations as parameters. We also assume that $U$ and $V$ are both nonsingular.

We can rewrite (4) as

$$
n^{1/2}(\hat\theta_n - \theta_0) = -\bigl[\nabla h_n(\theta_0)\bigr]^{-1}\bigl[n^{1/2} h_n(\theta_0)\bigr] + o_p(1),
$$

from which, by Slutsky's theorem, we get

$$
n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{w} -U^{-1} Z,
$$
where $Z$ is a normal random vector having the distribution on the right-hand side of (5). The distribution of $-U^{-1} Z$ is normal with mean zero and variance $U^{-1} V (U^{-1})^T$. Thus, under our assumptions ((4) plus $U$ and $V$ being nonsingular), we have

$$
n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{w} \text{Normal}\bigl(0,\, U^{-1} V (U^{-1})^T\bigr) \tag{7}
$$

(and that is the theory of unbiased estimating equations).

Note that if we are doing maximum likelihood, so $-U = V$, we have $U$ symmetric and $U^{-1} V (U^{-1})^T = V^{-1}$, and (7) gives the usual asymptotic distribution for the MLE: normal with mean zero and variance inverse Fisher information. But for general estimating equations $V \ne -U$ and $U \ne U^T$, so the variance of the normal distribution in (7) does not simplify.

The matrix $U^T V^{-1} U$ is sometimes called the Godambe information matrix, because its specialization to the theory of maximum likelihood is $-U$ or $V$ (which are equal in maximum likelihood), which is the Fisher information matrix, and because Godambe initiated the theory of unbiased estimating equations (more than that, Godambe studied efficient unbiased estimating equations, in which the Godambe information is as large as possible, so the asymptotic variance of the estimator is as small as possible). We can say that the variance of the normal distribution in (7) is inverse Godambe information.

Of course, $U$ and $V$ depend on the unknown parameter and hence are themselves unknown and must be estimated. Equation (6) suggests

$$
U_n = \nabla h_n(\hat\theta_n)
$$

as a natural estimator of $U$, but

$$
U_n \xrightarrow{w} U \tag{8}
$$

does not follow from (6) and (7), so we will just add it as another assumption. The natural estimator of $V$ is the empirical variance with the estimator plugged in

$$
V_n = \frac{1}{n} \sum_{i=1}^n g(X_i, \hat\theta_n)\, g(X_i, \hat\theta_n)^T,
$$

but

$$
V_n \xrightarrow{w} V \tag{9}
$$

does not follow from (5) and (7), so we will just add it as another assumption. Then by Slutsky's theorem we have

$$
\bigl[U_n^{-1} V_n (U_n^{-1})^T\bigr]^{-1/2}\bigl[n^{1/2}(\hat\theta_n - \theta_0)\bigr] \xrightarrow{w} \text{Normal}(0, I),
$$
which we write sloppily as

$$
\hat\theta_n \approx \text{Normal}\bigl(\theta_0,\; n^{-1} U_n^{-1} V_n (U_n^{-1})^T\bigr). \tag{10}
$$

Another name for the estimator $U_n^{-1} V_n (U_n^{-1})^T$ of the asymptotic variance is the sandwich estimator (think of $U_n$ as the slices of bread and $V_n$ as the ham).

2 Misspecified Maximum Likelihood

One application of the theory of unbiased estimating equations is the theory of misspecified maximum likelihood, that is, maximum likelihood done when the model is wrong and the true unknown distribution of the data is not any distribution in the model. Let $\lambda$ denote the Kullback-Leibler information function, defined by

$$
\lambda(\theta) = E_f\left\{\log \frac{f_\theta(X)}{f(X)}\right\}, \tag{11}
$$

where $f$ is the true unknown density of the data. Suppose $\lambda$ achieves its maximum over the parameter space at some point $\theta_0$ that is in the interior of the parameter space, so $\nabla\lambda(\theta_0) = 0$. Define

$$
g(x, \theta) = \nabla_\theta \log f_\theta(x).
$$

Assuming we can move the derivative inside the expectation in (11), we have

$$
E_f\{g(X, \theta_0)\} = 0, \tag{12}
$$

and this is enough to get the theory of unbiased estimating equations going. Note that (12) is not quite the same as (1), but we do have (12) for all $\theta_0$ that can arise as described (corresponding to some true unknown distribution $f$).

The theory of misspecified maximum likelihood is a bit odd in that we are not estimating the true unknown parameter value. There is no true unknown parameter value, because the true unknown distribution is not in our parametric model. We are estimating $\theta_0$, which is the parameter value specifying the distribution in the model that is closest to the true unknown distribution in the sense of maximizing Kullback-Leibler information. We can say that even when the model is misspecified the MLE is a consistent and asymptotically normal estimator of $\theta_0$, and the asymptotic variance
is inverse Godambe information (estimated by the sandwich estimator). The only simplification that arises in this situation is that

$$
U = E_f\bigl\{\nabla^2_\theta \log f_\theta(X)\,\big|_{\theta = \theta_0}\bigr\},
$$

so $U$ is a symmetric matrix, and we can write $U V^{-1} U$ for the Godambe information matrix and $U_n^{-1} V_n U_n^{-1}$ for the sandwich estimator (omitting transposes), but $V \ne -U$ when the model is misspecified, so these do not simplify further.

3 Composite Likelihood

It may come as a surprise to those who have no exposure to spatial statistics, but there are statistical models that are easily described but for which it is impossible in practice to evaluate the likelihood. Here is a simple example.

The data for an Ising model are an $r \times c$ matrix of two-valued random variables $Y_{ij}$. For mathematical convenience, we let the values be $-1$ and $+1$, and we also define $Y_{ij} = 0$ for $i$ and $j$ outside the range of values of indices for this matrix. Consider two statistics

$$
t_1(Y) = \sum_{i=1}^r \sum_{j=1}^c Y_{ij}
$$

$$
t_2(Y) = \sum_{i=1}^r \sum_{j=1}^c \bigl(Y_{ij} Y_{i,j+1} + Y_{ij} Y_{i+1,j}\bigr).
$$

We may think of the matrix $Y$ as a black and white image with $-1$ coding black pixels and $+1$ coding white pixels, in which case $t_1(Y)$ is the total number of white pixels minus the total number of black pixels, and $t_2(Y)$ is the total number of concordant (same color) neighbor pairs minus the total number of discordant (different color) neighbor pairs, where pixels are neighbors if they are adjacent either horizontally or vertically.

The Ising model is the full exponential family of distributions that has these two natural statistics and contains the distribution that makes all $2^{rc}$ possible data matrices equally likely. In theory, this model is very simple. Like all exponential families, it has log likelihood

$$
l(\theta) = \langle t(y), \theta \rangle - c(\theta),
$$
where

$$
c(\theta) = \log \sum_{y \in \mathcal{Y}} e^{\langle t(y), \theta \rangle}, \tag{13}
$$

and where $\mathcal{Y}$ is the set of all $2^{rc}$ possible data matrices. The sum with $2^{rc}$ terms in (13) does not simplify by any known method and hence is undoable by any means other than brute force summation over all $2^{rc}$ terms, which is completely impractical when $rc$ is more than 100, even if one had all the computers in the world harnessed to the task.

The method of composite likelihood (Lindsay, 1988) is a generalization of the method of pseudo-likelihood (Besag, 1974, 1975), which was designed specifically to tackle problems like this. Varin, Reid, and Firth (2011) review the current state of composite likelihood theory and practice. Okabayashi, Johnson, and Geyer (2011) apply composite likelihood to the Ising model.

The general idea of composite likelihood is the following. Suppose we have a statistical model in which the likelihood is difficult to compute, which means we cannot compute the joint density for arbitrary values of the data and parameter. But suppose we can calculate some marginal or conditional densities derived from the joint density. Suppose we can calculate the conditional density of $r_k(Y)$ given $s_k(Y)$ for $k = 1, \ldots, m$. If $r_k(Y)$ and $s_k(Y)$ are stochastically independent, for example when $s_k$ is a constant function, then the conditional density of $r_k(Y)$ given $s_k(Y)$ is the same as the marginal density of $r_k(Y)$. Thus we can use the same notation for both conditional and marginal densities.

Let $f_{k,\theta}$ denote the conditional density of $r_k(Y)$ given $s_k(Y)$. Considered as a function of the parameter with the observed data plugged in, its logarithm is a log likelihood; it just isn't the log likelihood for the given statistical model. In particular, we have the first Bartlett identity

$$
E_\theta\bigl\{\nabla_\theta \log f_{k,\theta}\bigl(r_k(Y) \mid s_k(Y)\bigr)\bigr\} = 0, \qquad \text{for all } \theta.
$$
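To see how quickly the brute-force sum in (13) becomes hopeless, here is a minimal sketch (my own, not from the notes) that evaluates $c(\theta)$ by enumerating all $2^{rc}$ configurations of a tiny $3 \times 3$ lattice: already 512 terms, and each extra pixel doubles the work.

```python
# Brute-force evaluation of the Ising log normalizing constant c(theta)
# in (13) for a tiny r x c lattice (a sketch; feasible only for small rc).
import itertools
import math

r, c = 3, 3

def stats(y):
    # natural statistics t1, t2; out-of-range neighbors count as 0,
    # matching the convention Y_ij = 0 outside the matrix
    t1 = sum(y[i][j] for i in range(r) for j in range(c))
    t2 = 0
    for i in range(r):
        for j in range(c):
            if j + 1 < c:
                t2 += y[i][j] * y[i][j + 1]
            if i + 1 < r:
                t2 += y[i][j] * y[i + 1][j]
    return t1, t2

def log_c(theta):
    # c(theta) = log of the sum over all 2^(r*c) configurations of exp<t(y), theta>
    total = 0.0
    for flat in itertools.product((-1, 1), repeat=r * c):
        y = [flat[i * c:(i + 1) * c] for i in range(r)]
        t1, t2 = stats(y)
        total += math.exp(theta[0] * t1 + theta[1] * t2)
    return math.log(total)

# sanity check: at theta = 0 all 2^(rc) matrices get equal weight,
# so c(0) = rc * log 2
print(log_c((0.0, 0.0)))  # equals 9 * log(2)
print(log_c((0.2, 0.5)))
```

For $r = c = 10$ the same loop would need $2^{100}$ iterations, which is exactly the impracticality the text describes.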
Since the expectation of a sum is the sum of the expectations, regardless of whether the terms are stochastically dependent or independent, we also have

$$
E_\theta\left\{\sum_{k=1}^m \nabla_\theta \log f_{k,\theta}\bigl(r_k(Y) \mid s_k(Y)\bigr)\right\} = 0, \qquad \text{for all } \theta.
$$

Thus

$$
\sum_{k=1}^m \nabla_\theta \log f_{k,\theta}\bigl(r_k(Y) \mid s_k(Y)\bigr) = 0 \tag{14}
$$
are unbiased estimating equations to be solved for an estimate $\hat\theta$ of the parameter. This leads us to define the function

$$
l(\theta) = \sum_{k=1}^m \log f_{k,\theta}\bigl(r_k(Y) \mid s_k(Y)\bigr) \tag{15}
$$

(the left-hand side of (14) is the derivative of (15)), which is called a composite likelihood for the problem. Any maximizer of (15), necessarily a solution of the estimating equations (14), is called the maximum composite likelihood estimator (MCLE).

The method of pseudo-likelihood applied to the Ising model needs double subscripts, so we write $r_{ij}(Y)$ and $s_{ij}(Y)$ rather than $r_k(Y)$ and $s_k(Y)$. We take $r_{ij}(Y) = Y_{ij}$, and we let $s_{ij}(Y)$ be the matrix that is the same as $Y$ except that the $i,j$ element is zero. Thus $r_{ij}(Y)$ is the data for the $i,j$ pixel and $s_{ij}(Y)$ is the data for all the other pixels. Since the log likelihood contains terms involving only single pixels and neighbor pairs of pixels, it is clear that the conditional distribution of $Y_{ij}$ given the rest of the data involves only the four pixels that are neighbors of the $i,j$ pixel. Furthermore, since $Y_{ij}$ has only two possible values, normalizing its conditional distribution involves a sum with only two terms. Define

$$
X_{ij} = Y_{i,j+1} + Y_{i,j-1} + Y_{i+1,j} + Y_{i-1,j}.
$$

Then

$$
f_{i,j,\theta}\bigl(Y_{ij} \mid s_{ij}(Y)\bigr) = \frac{\exp\bigl(Y_{ij}[\theta_1 + \theta_2 X_{ij}]\bigr)}{\exp\bigl(Y_{ij}[\theta_1 + \theta_2 X_{ij}]\bigr) + \exp\bigl(-Y_{ij}[\theta_1 + \theta_2 X_{ij}]\bigr)}. \tag{16}
$$

The pseudo-likelihood is just the product of the terms (16). Algebraically, it has the form of the likelihood for a logistic regression. Since

$$
\log \frac{f_{i,j,\theta}\bigl(+1 \mid s_{ij}(Y)\bigr)}{f_{i,j,\theta}\bigl(-1 \mid s_{ij}(Y)\bigr)} = 2[\theta_1 + \theta_2 X_{ij}],
$$

we can estimate $\theta$ by doing a logistic regression with response vector having components $(Y_{ij} + 1)/2$ (the $Y$ matrix recoded to have values zero and one and strung out in a vector), one non-constant predictor vector having components $2 X_{ij}$, and one constant predictor having components $2$. Using composite likelihoods with $r_{ij}(Y)$ involving more than one component of $Y$ is more complicated but doable (Okabayashi, et al., 2011).

References

Besag, J. (1974).
Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36.

Besag, J. (1975). Statistical analysis of non-lattice data. Statistician, 24.

Ferguson, T. S. (1996). A Course in Large Sample Theory. London: Chapman & Hall.

Geyer, C. J. (preprint). Asymptotics of maximum likelihood without the LLN or CLT or sample size going to infinity.

Lindsay, B. (1988). Composite likelihood methods. Contemporary Mathematics, 80.

Okabayashi, S., Johnson, L. and Geyer, C. J. (2011). Extending pseudolikelihood for Potts models. Statistica Sinica, 21.

Varin, C., Reid, N. and Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica, 21.
AGEC 661 ote Eleven Ximing Wu M-estimator So far we ve focused on linear models, where the estimators have a closed form solution. If the population model is nonlinear, the estimators often do not have
More informationStat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory. Charles J. Geyer School of Statistics University of Minnesota
Stat 5101 Lecture Slides: Deck 7 Asymptotics, also called Large Sample Theory Charles J. Geyer School of Statistics University of Minnesota 1 Asymptotic Approximation The last big subject in probability
More informationSTATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero
STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero 1999 32 Statistic used Meaning in plain english Reduction ratio T (X) [X 1,..., X n ] T, entire data sample RR 1 T (X) [X (1),..., X (n) ] T, rank
More informationStatistical inference
Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall
More informationA Conditional Approach to Modeling Multivariate Extremes
A Approach to ing Multivariate Extremes By Heffernan & Tawn Department of Statistics Purdue University s April 30, 2014 Outline s s Multivariate Extremes s A central aim of multivariate extremes is trying
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More information1. Fisher Information
1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score
More informationNotes on the Multivariate Normal and Related Topics
Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions
More informationDoes Better Inference mean Better Learning?
Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract
More informationAlgorithmic approaches to fitting ERG models
Ruth Hummel, Penn State University Mark Handcock, University of Washington David Hunter, Penn State University Research funded by Office of Naval Research Award No. N00014-08-1-1015 MURI meeting, April
More informationSpatial statistics, addition to Part I. Parameter estimation and kriging for Gaussian random fields
Spatial statistics, addition to Part I. Parameter estimation and kriging for Gaussian random fields 1 Introduction Jo Eidsvik Department of Mathematical Sciences, NTNU, Norway. (joeid@math.ntnu.no) February
More informationRegression #3: Properties of OLS Estimator
Regression #3: Properties of OLS Estimator Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #3 1 / 20 Introduction In this lecture, we establish some desirable properties associated with
More informationRegression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.
Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More informationGraduate Econometrics I: Maximum Likelihood I
Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
More informationV. Properties of estimators {Parts C, D & E in this file}
A. Definitions & Desiderata. model. estimator V. Properties of estimators {Parts C, D & E in this file}. sampling errors and sampling distribution 4. unbiasedness 5. low sampling variance 6. low mean squared
More informationLecture 10: Generalized likelihood ratio test
Stat 200: Introduction to Statistical Inference Autumn 2018/19 Lecture 10: Generalized likelihood ratio test Lecturer: Art B. Owen October 25 Disclaimer: These notes have not been subjected to the usual
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationRecap. Vector observation: Y f (y; θ), Y Y R m, θ R d. sample of independent vectors y 1,..., y n. pairwise log-likelihood n m. weights are often 1
Recap Vector observation: Y f (y; θ), Y Y R m, θ R d sample of independent vectors y 1,..., y n pairwise log-likelihood n m i=1 r=1 s>r w rs log f 2 (y ir, y is ; θ) weights are often 1 more generally,
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationStatistical Inference with Regression Analysis
Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationA Few Notes on Fisher Information (WIP)
A Few Notes on Fisher Information (WIP) David Meyer dmm@{-4-5.net,uoregon.edu} Last update: April 30, 208 Definitions There are so many interesting things about Fisher Information and its theoretical properties
More informationBetter Bootstrap Confidence Intervals
by Bradley Efron University of Washington, Department of Statistics April 12, 2012 An example Suppose we wish to make inference on some parameter θ T (F ) (e.g. θ = E F X ), based on data We might suppose
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More information